[WIP] Add dynamic scheduling operation mode #271
This still fails some tests but after switching away from |
aspiers
left a comment
Thanks for accepting my suggestion. As per this comment this doesn't seem to work for me yet.
This commit adds an opt-in scheduler option for dynamic scheduling. Instead of partitioning the test list up front based on historical timing data, each worker asks for the next test dynamically. This is built using Python's multiprocessing module to launch new workers instead of shelling out to call python via subprocess. This will hopefully provide better worker balance, since each worker stays occupied until there are no more tests to run, instead of trying to pack each worker optimally up front. Additionally, this should hopefully improve the pdb story for users who use pdb with tests: instead of spawning subprocesses that call python to invoke the subunit runner and reading the subunit stream from stdout, it uses multiprocessing to fork workers and uses pipes to pass the subunit streams between workers.
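To illustrate the balance argument, here is a small simulation (not stestr code). It assumes a simple greedy up-front partitioner that packs tests by estimated duration; stestr's real partitioner may differ. The function names, test ids, and timings are all made up for the sketch. When the historical estimates are stale, the static partition can leave one worker overloaded, while workers that pull the next test when idle finish sooner.

```python
import heapq


def static_makespan(estimates, actual, workers):
    """Partition up front by *estimated* duration, then run with actual times.

    Assumed greedy packing: longest estimate first, onto the worker with the
    smallest estimated load. The partition is fixed before the run starts.
    """
    loads = [(0.0, w) for w in range(workers)]
    heapq.heapify(loads)
    real = [0.0] * workers
    for test in sorted(estimates, key=estimates.get, reverse=True):
        est_load, w = heapq.heappop(loads)
        real[w] += actual[test]
        heapq.heappush(loads, (est_load + estimates[test], w))
    return max(real)


def dynamic_makespan(order, actual, workers):
    """Each worker takes the next test from a shared list when it goes idle."""
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for test in order:
        idle_time = heapq.heappop(free_at)  # worker that becomes idle first
        heapq.heappush(free_at, idle_time + actual[test])
    return max(free_at)


# Hypothetical data: history says every test takes 1s, but test "a" now takes 4s.
estimates = {"a": 1, "b": 1, "c": 1, "d": 1}
actual = {"a": 4, "b": 1, "c": 1, "d": 1}

print(static_makespan(estimates, actual, 2))      # 5.0
print(dynamic_makespan(list(actual), actual, 2))  # 4.0
```

With accurate estimates the two approaches can tie; the dynamic mode mainly helps when timing data is missing or out of date.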
Co-Authored-By: Adam Spiers <github@adamspiers.org>
This commit fixes the failing tests by catching a couple of things that were missed in the update. The biggest fix is that the --no-discover case still uses a subprocess, so we need to tell output.ReturnCodeToSubunit that the input is not dynamic (and is therefore a Popen object) so it can handle that properly. The other major change is that the return code tests are updated so that the stdout and stderr from the subprocess calls are always decoded in the non-subunit test cases. This was done primarily for ease of debugging, but it also enabled the removal of several decode() calls when the output is parsed.
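As a rough illustration of decoding subprocess output once, at the point where it is captured, the sketch below uses only the standard library. The helper name `run_and_decode` is hypothetical and is not the helper used in stestr's test_return_codes module.

```python
import subprocess
import sys


def run_and_decode(cmd):
    """Run cmd and return (returncode, stdout, stderr) with both streams as str."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()  # bytes, because no text mode was requested
    # Decode once here so callers never need their own decode() calls.
    return proc.returncode, out.decode("utf-8"), err.decode("utf-8")


# Hypothetical usage: run a trivial child interpreter and inspect its output.
code, out, err = run_and_decode([sys.executable, "-c", "print('ok')"])
print(code, repr(out.strip()))
```

Callers that need the raw subunit byte stream would still have to keep the bytes, which is why a blanket decode is only suitable for the non-subunit cases.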
This is a refinement of the previous commit to reduce unnecessary changes to the functional tests in the test_return_codes module. Mainly, always decoding the output from the subprocess for testing unexpectedly broke things when a bytes object was expected.
8933610 to e68b506
Conflicts: stestr/commands/run.py stestr/output.py stestr/test_processor.py stestr/tests/test_return_codes.py
I originally developed this feature when stestr still supported older Python versions. The dynamic scheduling feature depends on functionality added in Python 3.5. The WIP feature branch then sat stale for years, and in that time we've bumped the minimum supported Python version to 3.7, so the runtime check for older Python versions is no longer needed.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #271      +/-   ##
==========================================
- Coverage   61.42%   59.74%   -1.68%
==========================================
  Files          30       30
  Lines        2613     2703      +90
  Branches      404      421      +17
==========================================
+ Hits         1605     1615      +10
- Misses        889      964      +75
- Partials      119      124       +5
This commit fixes an issue that occurred in earlier commits on the PR around the initialization of the worker processes and the scope of the launch method. Previously, if the method used to launch the workers returned before all the workers had accessed the queue for the first time, those workers wouldn't be able to read from the queue. This race condition occurred because the Queue was locally scoped to the method and would be deleted by the main process before the other workers could read it. It would specifically occur on systems using the "forkserver" or "spawn" multiprocessing start methods, because the child processes didn't have the queue object, while "fork" would work because the process memory was copied into the child process. This commit fixes this by scoping the Queue object to the instance, which means it survives as long as the test processor object does (which is typically the entire run command). As part of this change, the start method used by the new dynamic scheduler is fixed to "spawn" to minimize any potential interactions between stestr and the code under test. This mirrors the behavior of running in non-dynamic scheduler mode, because spawn is roughly equivalent to calling python in a subprocess.
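The shape of that fix can be sketched roughly as follows. This is not the stestr implementation: `DynamicProcessor`, `_worker`, `launch`, and `collect` are hypothetical names, and the workers only echo the test ids they pull instead of running anything. The two points it tries to show are that the queue is an instance attribute, so it outlives `launch()`, and that the start method is pinned with `multiprocessing.get_context("spawn")`.

```python
import multiprocessing


def _worker(test_queue, result_conn):
    """Pull test ids until the None sentinel arrives, then report what was seen."""
    seen = []
    while True:
        test_id = test_queue.get()
        if test_id is None:
            break
        seen.append(test_id)  # a real worker would run the test here
    result_conn.send(seen)
    result_conn.close()


class DynamicProcessor:
    """Hypothetical stand-in for a test processor object."""

    def __init__(self, test_ids, concurrency=2):
        # Pin the start method instead of relying on the platform default.
        self._ctx = multiprocessing.get_context("spawn")
        # Instance attribute: the queue lives as long as this object does,
        # not just for the duration of launch().
        self.test_queue = self._ctx.Queue()
        self.test_ids = list(test_ids)
        self.concurrency = concurrency

    def launch(self):
        """Fill the queue, start the workers, and return (process, pipe) pairs."""
        for test_id in self.test_ids:
            self.test_queue.put(test_id)
        for _ in range(self.concurrency):
            self.test_queue.put(None)  # one sentinel per worker
        workers = []
        for _ in range(self.concurrency):
            recv_conn, send_conn = self._ctx.Pipe(duplex=False)
            proc = self._ctx.Process(
                target=_worker, args=(self.test_queue, send_conn)
            )
            proc.start()
            send_conn.close()  # the parent only reads from this pipe
            workers.append((proc, recv_conn))
        return workers


def collect(workers):
    """Read each worker's report from its pipe, then wait for it to exit."""
    results = []
    for proc, conn in workers:
        results.extend(conn.recv())
        proc.join()
    return results
```

Because "spawn" re-imports the main module in each child, a script using this sketch should call `collect(DynamicProcessor([...]).launch())` under an `if __name__ == "__main__":` guard.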
This commit improves the documentation of the new --dynamic flag to explain how it operates and what its goal is. It also makes it clear that the feature is experimental and opt-in, to be used at your own risk. Also, from testing, this doesn't currently work on Windows; instead of blocking the feature over a platform used by 2-3% of our users (according to https://pypistats.org/packages/stestr ), this just marks it as currently unsupported. We will have to revisit how to make this work on Windows before we stabilize the feature.
Under Python 3.13, I get: |